EXECUTIVE SUMMARY


INTRODUCTION 

One  of  the  major  objectives  of  the  Future  Combat  System  family  of  military 
vehicles  is  to  achieve  high  reliability  and  operational  availability.  The  proposed  means  of 
achieving  this  objective  is  a  prognostics-based  approach  to  maintenance.  That  is,  based 
on  prediction  of  remaining  life,  parts  vulnerable  to  failure  are  replaced  just  before  they 
fail  or  before  an  upcoming  mission.  This  paper  presents  analysis  results  that  show  how 
the  prognostics  maintenance  strategy  affects  vehicle  reliability  and  operational 
availability,  contrasts  this  approach  with  traditional  maintenance  approaches  (replacing 
the  parts  after  they  break  and  at  set  time  intervals),  and  underscores  benefits  and 
limitations  of  each  strategy. 

METHOD 

We compare two maintenance strategies. In simultaneous-replacement-based
maintenance, parts are replaced at set intervals, regardless of their condition. We
evaluate the effect of replacement frequency on operational availability and costs (which
are assumed to be proportional to the number of spare parts required to maintain the
system). In prognostics-based maintenance, we assume that the prognostic capability
exists for a fraction of all critical parts (in the analysis, this fraction is treated as a
parameter varying from 0 to 1). We assess the effect of this strategy on operational
availability and associated costs in terms of the number of spare parts required.

To  evaluate  and  compare  these  two  maintenance  approaches,  we  constructed  a 
Monte  Carlo  model  to  simulate  the  operation  of  a  system  composed  of  300  platforms. 
The  model  computes  the  reliability  of  such  a  set  of  multiple  platforms,  where  each 
platform  is  composed  of  a  number  of  identical  critical  parts,  operated  over  a  given 
mission  period,  with  a  specified  part-replacement  strategy. 

To  establish  a  baseline,  the  system  of  300  platforms  was  first  simulated  in  the 
replace-as-needed mode, with instantaneous replacement of failed parts. We then ran the
simulation  for  the  simultaneous-replacement-based  maintenance  strategy,  using  various 
simultaneous-replacement  frequencies.  Finally,  we  ran  the  simulation  for  the  prognostics- 
based  maintenance  strategy,  using  various  prognostic  ratios  (the  ratio  of  the  number  of 
critical parts for which the remaining life can be predicted to the total number of critical
parts).

We  evaluated  each  strategy  in  terms  of  administrative  and  logistic  delay  time  and 
operational  availability,  among  other  parameters. 

CONCLUSIONS 

The  simultaneous  replacement  of  all  mission-critical  parts  increases  operational 
availability  over  a  specified  period  of  time.  It  can  be  done  at  regular  intervals  or  just 
before  an  upcoming  mission  to  increase  the  system  pulse  reliability.  Frequent 
simultaneous  replacements  as  a  maintenance  strategy  will  lead  to  an  increased  average 
operational  availability  over  extended  time  periods  even  for  low-reliability  systems,  but  it 
will  also  result  in  significant  costs,  the  result  of  replacing  most  parts  prematurely  and 
underutilizing  their  service  life.  While  the  risk  of  failure  is  reduced  because  the  platform 
now  has  new  parts,  the  replaced  parts  are  thrown  away  even  though  most  of  them  still 
have  some  (in  many  cases  substantial)  useful  life  left. 

The  prognostics  approach  is  a  more  effective  way  to  maintain  desired  operational 
availability.  It  allows  reduced  administrative  and  logistic  delay  times  by  anticipating 
upcoming  failures  and  preparing  the  necessary  parts  in  advance.  In  addition,  the 
prognostics  capability  allows  for  intelligent  maintenance — replacing  only  those  parts 
whose  remaining  lifetime  has  reached  a  critical  value.  The  prognostics  approach, 
therefore, allows full utilization of each part’s service life; this operational availability
increase  is  obtained  at  significantly  lower  cost  (number  of  spares)  than  that  of  the 
frequent  simultaneous-replacement  maintenance  strategy. 

One  of  the  main  operational  benefits  of  the  prognostics  approach  is  that  it  leads  to 
potentially  failure-free  missions  because  it  allows  field  commanders  to  select  only  those 
platforms whose predicted remaining life exceeds the duration of the upcoming mission.
I. INTRODUCTION


Research  presented  in  this  paper  is  motivated  by  the  question  of  how  to  achieve 
high  reliability  and  operational  availability  of  complex  systems.  Significant  operational 
benefits  are  to  be  gained  by  employing  highly  reliable  military  assets.  A  brief  history  of 
U.S.  military  reliability  objectives  and  major  reasons  why  achieving  high  reliability  is  so 
important  are  described  in  [1]  and  summarized  here  as  follows:  (1)  maintaining  weapons 
systems  consumes  a  significant  portion  of  the  total  defense  budget;  (2)  mission  reliability 
is  a  key  factor  in  determining  system  effectiveness;  (3)  the  sustainable  level  of  system 
readiness  is  in  large  measure  determined  by  its  reliability  and  maintainability 
characteristics. 

In the current effort to develop the Future Combat Systems (FCS) family of
military  vehicles,  achieving  high  operational  availability  is  one  of  the  major  objectives 
[2].  To  this  end,  reliability  requirements  of  the  new  military  platforms  have  been 
significantly  increased.  To  achieve  increased  reliability  and  operational  availability, 
development  of  a  prognostics-based  approach  that  will  utilize  “an  embedded  mission 
readiness  system”  is  proposed.  “This  system  will  monitor  the  status  of  mission  critical 
parts... The  embedded  readiness  system  will  include  the  capability  to  forecast  the  future 
state  of  the  FCS  system...”  [ref.  2,  pp.  58-59]. 

In this paper, we examine ways to improve the reliability of complex systems and
the  relationship  between  a  system’s  reliability  and  its  operational  availability.  We 
investigate  in  detail  two  approaches  for  achieving  and  maintaining  high  operational 
availability  of  military  systems:  simultaneous  replacement  of  mission-critical  parts  and 
prognostics  asset-management  strategies. 
II. PROBLEM FORMULATION


A.  SYSTEM  RELIABILITY 

Consider  a  military  system  (e.g.,  a  vehicle)  with  a  number  of  mission-critical 
parts.  The  parts  are  mission  critical  in  the  sense  that  failure  of  one  of  the  parts  results  in 
the overall system being unavailable for use. We are interested in estimating the mission
availability  of  this  system  based  on  the  reliability  of  its  parts.  In  our  analysis,  we  assume 
that  the  parts  operate  independently  (i.e.,  operation  and  failure  of  one  of  the  parts  does 
not  affect  the  operation  of  other  parts).  We  also  assume  that  the  parts  are  connected  in 
series,  meaning  that  if  one  or  more  parts  fail,  the  system  will  no  longer  be  operational. 

The  reliability  of  such  a  system  with  n  components  can  be  found  as  follows: 

$$R(t) = (1 - F_1(t)) \cdot (1 - F_2(t)) \cdots (1 - F_n(t)) = \prod_{i=1}^{n} (1 - F_i(t)) \qquad (1)$$

where $F_i(t)$ is the cumulative failure distribution function for the ith part:

$$F_i(t) = \int_0^t f_i(\tau)\, d\tau \qquad (2)$$

and $f_i(t)$ is the probability density function of the ith part’s life. The overall system
reliability, $R(t)$, can in some cases be expressed analytically in closed form. For
example, when the individual parts fail exponentially (i.e., $f_i(t) = \lambda_i \exp(-\lambda_i t)$,
$\lambda_i$ being the part failure rate), the system reliability can be expressed as
$R(t) = \exp(-\Lambda t)$, where $\Lambda = \sum_{i=1}^{n} \lambda_i$. In general, however, the
expression for $f_i(t)$ depends on the operating conditions and failure mechanisms of the
part, and equation (1) can only be evaluated numerically.
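As an illustration of equation (1), the following minimal Python sketch multiplies the part survival probabilities and checks the result against the closed-form exponential case. The function names and the failure rates are hypothetical, chosen only for the example; they are not taken from the study.

```python
import math
from typing import Callable, Sequence


def series_reliability(cdfs: Sequence[Callable[[float], float]], t: float) -> float:
    """Equation (1): R(t) is the product over parts of (1 - F_i(t))."""
    r = 1.0
    for cdf in cdfs:
        r *= 1.0 - cdf(t)
    return r


def exponential_cdf(rate: float) -> Callable[[float], float]:
    """F(t) = 1 - exp(-rate * t) for a part with a constant failure rate."""
    return lambda t: 1.0 - math.exp(-rate * t)


# Illustrative failure rates per day (assumed values, not data from the paper).
rates = [0.01, 0.02, 0.005]
t = 30.0

numeric = series_reliability([exponential_cdf(r) for r in rates], t)
closed_form = math.exp(-sum(rates) * t)  # R(t) = exp(-Lambda * t)
```

For exponential parts the two values agree to rounding error; for other lifetime distributions only the numerical product is generally available.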

B.  OPERATIONAL  AVAILABILITY 

In  addition  to  considering  the  system  reliability,  we  also  investigate  the  effect  of 
prognostics on operational availability ($A_o$), which is defined as the ratio of the time the
system was available for operation to the total mission time [3]:

$$A_o = \frac{T_m - T_d}{T_m} \qquad (3)$$

where $T_d$ is the system down time and $T_m$ is the total mission time. The down time of the
system is determined by the number of system failures during the mission (i.e., its
reliability) and the time it takes to bring the system back to operational status. Repair
times and simultaneous-replacement policies have a significant impact on $A_o$. A detailed
discussion on this subject is given in [4].
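The definition of operational availability as the available fraction of the mission time can be sketched in a few lines of Python. The function name and the numbers in the example are hypothetical and serve only to show the arithmetic.

```python
def operational_availability(down_time: float, mission_time: float) -> float:
    """Fraction of the mission during which the system was available.

    Both arguments must be in the same time unit.
    """
    if mission_time <= 0.0:
        raise ValueError("mission_time must be positive")
    if not 0.0 <= down_time <= mission_time:
        raise ValueError("down_time must lie between 0 and mission_time")
    return (mission_time - down_time) / mission_time


# Hypothetical example: 12 days of accumulated down time in a 100-day mission.
a_o = operational_availability(down_time=12.0, mission_time=100.0)
```

In this example the system is available for 88 of the 100 days, so the availability is 0.88.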

C.  PARTS  REPLACEMENT  STRATEGIES 

1. Simultaneous Replacement of Mission-Critical Parts

The strategy of simultaneous replacement of mission-critical parts is, as its name
suggests,  the  periodic,  complete,  and  simultaneous  replacement  of  only  those  parts  that 
cause  the  system  (or  vehicle)  to  be  inoperable  if  one  or  more  of  them  fail.  (We  refer  to 
this  strategy  as  “simultaneous  replacement”  hereinafter.)  To  evaluate  the  effect  of 
simultaneous  replacement  on  the  vehicle  operational  characteristics,  consider  a 
maintenance  policy  with  an  operational  time  between  simultaneous  replacements  equal  to 
T. At time t = T, the system reliability drops to R(T), at which point the system-critical
parts are replaced by new parts. Reliability just before the simultaneous replacement can
be determined as follows:

$$R(T) = \prod_{i=1}^{n} \left(1 - \int_0^T f_i(t)\, dt\right) \qquad (4)$$

Since  the  replaced  parts  are  assumed  to  be  as  good  as  new,  the  system  reliability 
just  after  the  simultaneous  replacement  is  brought  back  to  1.  The  frequent-simultaneous- 
replacement  approach  may  be  a  way  to  maintain  high  average  system  reliability,  but  it 
could  lead  to  prohibitively  high  life-cycle  costs  because  it  results  in  many  parts  being 
replaced  before  the  end  of  their  useful  service  life. 
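To give a feel for equation (4), the sketch below evaluates the reliability just before a simultaneous replacement for n identical, independent parts. It assumes Gaussian part lifetimes and borrows illustrative values used in the simulations later in this paper (20 parts, a mean lifetime of 182.5 days, a standard deviation of 25 percent of the mean); the function names are hypothetical.

```python
import math


def normal_cdf(x: float, mean: float, std: float) -> float:
    """Cumulative distribution function of a Gaussian, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def reliability_before_replacement(n_parts: int, mean_life: float,
                                   std_life: float, interval: float) -> float:
    """Equation (4) for identical parts: (1 - F(T)) ** n."""
    return (1.0 - normal_cdf(interval, mean_life, std_life)) ** n_parts


mean_life = 182.5             # days
std_life = 0.25 * mean_life   # assumed 25 percent of the mean

# Replacement intervals of 75 percent and 50 percent of the mean part lifetime.
r_75 = reliability_before_replacement(20, mean_life, std_life, 0.75 * mean_life)
r_50 = reliability_before_replacement(20, mean_life, std_life, 0.50 * mean_life)
```

Under these assumptions, shortening the interval from 75 percent to 50 percent of the mean lifetime raises the probability of reaching the next replacement without a failure from a few percent to roughly 60 percent.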

2.  Prognostics 

A  discussion  of  the  prognostics  approach  to  maintenance  of  military  systems  and 
a  description  of  a  current  DARPA  effort  focused  on  developing  prognostics  technologies 
can  be  found  in  [5].  An  example  of  a  prognostics  approach  applied  to  helicopter 
propulsion  is  described  in  [6].  Three  basic  prognostics  strategies  can  be  distinguished:  the 
traditional  risk-based  approach,  failure-precursor-based  approach,  and  physics-of-failure 
approach  [7]. 

The risk-based approach is based on the service life distribution of a part
(assumed or known from its operational history). When the risk of operational failure
reaches a threshold (typically a very small number), the system operation is stopped, and
the part is replaced. (This approach is routinely used on aircraft for life-critical parts.)
Figure II-1 is a schematic description of this approach.

[Schematic: damage distribution versus time, with the critical damage level and the risk of failure marked.]

Figure II-1. Risk-Based Approach


The  risk  of  failure  in  this  case  can  be  approximated  by  the  area  under  the  curve, 
which  describes  a  probability  density  distribution  of  the  damage  reaching  critical  size. 
Such  a  curve  can,  for  example,  describe  a  distribution  of  possible  crack  sizes  at  a  certain 
location  of  a  disc  or  blade  in  a  turbine  jet  engine  after  a  number  of  flights.  As  the  number 
of  flights  increases,  the  distribution  of  possible  crack  sizes  changes,  reflecting  increased 
probability  of  larger  cracks.  Eventually,  there  is  a  small  but  finite  probability  that  the 
crack  has  reached  a  critical  length,  which  is  determined  by  material  properties  and 
upcoming loading conditions. The inspection interval times are set so that the parts are
inspected just before the crack reaches critical size. The part is replaced if a defect is
found or if the part has flown a certain number of cycles, even if no defect is found.


Figure  II-2  is  a  schematic  description  of  a  precursor-based  approach.  It  is 
applicable  to  situations  when  there  is  a  degradation  mechanism  that  can  be  observed  and 
detected  by  a  sensor.  At  some  point  during  the  vehicle  operation,  the  damage-detection 
threshold  is  reached,  indicating  that  damage  has  reached  some  level  detectable  by  a 
sensor. In cases when the precursor time (i.e., the time it takes the damage to progress
from a detectable to a critical level) is long enough to plan ahead for the
maintenance action, this is a useful approach. But in cases where a failure is detected just
before it happens, the practical applicability is limited.

Figure II-2. Precursor-Based Approach


Figure  II-3  depicts  the  physics-of-failure  approach,  an  approach  that  has  been 
widely  reported  in  the  literature  (examples  of  applications  of  this  approach  to  mechanical 
systems  can  be  found  in  [7]  and  [8]).  In  this  approach,  it  is  assumed  that  structural  loads 
are  continuously  monitored,  and  damage  progression  is  modeled  by  a  physics-of-failure 
model  and  validated  by  on-  or  off-line  sensors.  The  physics-of-failure  model  predicts  the 
state  of  damage  and  the  remaining  life  for  an  assumed  loading  during  an  upcoming 
mission. 

Figure II-3. Physics-of-Failure Approach

In this paper, we assume that the prognostics capability allows us to know the
current damage state of the part and its remaining life. That is, we can deterministically
predict when the part is going to fail.1

The prognostics approach allows a flexible maintenance strategy: replacing the
part just before it fails or at some predetermined time interval before failure, which may
be  based  on  the  upcoming  mission  duration  or  the  availability  of  mechanics  and  spare 
parts.  For  example,  if  it  is  desired  that  no  failures  should  take  place  during  an  upcoming 
mission,  all  parts  with  remaining  life  less  than  the  upcoming  mission  duration  time  would 
be  replaced. 
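The pre-mission replacement rule described above can be sketched as a simple filter over the predicted remaining lives. The function name and the sample values are hypothetical, and the sketch assumes the idealized, deterministic prognostics adopted in this paper.

```python
from typing import List, Sequence


def parts_to_replace(remaining_life: Sequence[float],
                     mission_duration: float) -> List[int]:
    """Indices of parts whose remaining life is less than the mission duration."""
    return [i for i, life in enumerate(remaining_life) if life < mission_duration]


# Hypothetical remaining lives (days) of five critical parts before a 7-day mission.
remaining = [12.0, 3.5, 40.0, 6.9, 7.0]
selected = parts_to_replace(remaining, mission_duration=7.0)
```

Here the parts at indices 1 and 3 are selected; the part with exactly 7.0 days of remaining life is not, because the rule as stated uses a strict "less than" comparison. A safety margin could be added by comparing against a slightly longer duration.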

D.  INVESTIGATION  OBJECTIVES  AND  APPROACH 

The  goal  of  the  reported  investigation  is  to  evaluate  the  effect  of  prognostics  on  a 
system’s  reliability  and  its  operational  availability  and  compare  those  practices  with 
traditional  maintenance  policies.  To  this  end,  we  implemented  a  Monte  Carlo  modeling 
approach  based  on  the  following  assumptions.  First,  we  assumed  that  the  platforms  (or  a 
set  of  military  vehicles)  under  investigation  have  critical  parts  with  the  same  failure 
probability density function. Second, we assumed that the critical part’s service life has a
Gaussian distribution. This choice is motivated by our goal to highlight the differences
between the simultaneous replacement and prognostics approaches and to investigate the
potential benefits of prognostics. To ensure realistic lifetime values, we used data from
real-time military exercises to estimate the mean and standard deviation of the parts’
lifetimes [9].


1  This  is  obviously  an  idealized  situation.  In  reality,  there  will  be  uncertainties  associated  with  the 
upcoming  mission  conditions,  operating  environment,  predictive  models,  and  sensors.  Quantification 
of  these  uncertainties  is  outside  of  the  scope  of  this  investigation. 
III. SIMULATION DETAILS


A.  DESCRIPTION 

We  constructed  a  Monte  Carlo  model  to  simulate  the  operation  of  a  system 
composed  of  a  large  number  of  platforms.  The  model  computes  the  reliability  for  such  a 
set  of  multiple  platforms,  where  each  platform  is  composed  of  a  number  of  identical 
critical  parts  and  is  operated  over  a  given  mission  period  with  a  specified  part- 
replacement  strategy.  Platform  failure  (mission  abort)  takes  place  when  any  one  of  the 
critical  parts  fails.  The  part  failures  are  independent,  and  they  have  randomly  distributed 
lifetimes.  We  investigated  three  part-replacement  strategies:  (1)  replace  as  needed  (our 
baseline),  (2)  simultaneous  replacement  at  a  specified  interval,  and  (3)  prognostics. 


B.  PART  FAILURE  LIFE  DISTRIBUTION 


In  the  model,  any  distribution  may  be  used  to  describe  the  probability  of  part 
failure  as  function  of  time.  For  this  study,  we  chose  the  Gaussian  distribution: 


$$P(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right] \qquad (5)$$


$P(x)$ is the probability density that a part has lifetime $x$ given the mean part lifetime
$\mu$ and standard deviation $\sigma$. The mean part lifetime $\mu$ is a variable parameter,
whose effect on reliability and operational availability we investigate. The standard
deviation $\sigma$ is held fixed at 25 percent of the mean part lifetime because variation of
$\sigma$ has only a minor effect on $A_o$ (see Appendix A).
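A minimal sketch of how part lifetimes might be drawn from the Gaussian distribution of equation (5) is given below. The redrawing of non-positive samples is an assumption of this sketch (the treatment of such samples is not specified here), and the function name, seed, and sample count are hypothetical.

```python
import random


def sample_lifetime(rng: random.Random, mean_life: float,
                    rel_std: float = 0.25) -> float:
    """Draw one part lifetime; non-positive draws are redrawn (assumption)."""
    while True:
        life = rng.gauss(mean_life, rel_std * mean_life)
        if life > 0.0:
            return life


rng = random.Random(0)  # fixed seed so the example is repeatable
samples = [sample_lifetime(rng, mean_life=182.5) for _ in range(10_000)]
sample_mean = sum(samples) / len(samples)
```

With a standard deviation of 25 percent of the mean, non-positive draws lie four standard deviations below the mean and are rare, so the redraw has a negligible effect on the sample mean.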


C.  IMPLEMENTATION  OF  PART-REPLACEMENT  STRATEGIES 

In the replace-as-needed maintenance strategy, parts are replaced as they fail, and
the platform incurs repair time and administrative and logistic delay time (ALDT). The
ALDT and repair time are randomly chosen from a Gaussian distribution with user-
specified mean and standard deviation. (The values for these distributions are chosen from
the data for Army platforms [9].)

For  the  simultaneous-replacement  maintenance  strategy,  all  parts  are  replaced  at 
the same time at user-specified time intervals. No ALDT associated with part delivery
occurs because it is assumed that all the parts are prepared beforehand. Thus, only repair
times are randomly chosen from a Gaussian distribution. It is also assumed that all parts
are repaired within the time it takes to fix the part with the longest repair time. In
addition, if during the platform operation a part fails before the scheduled simultaneous-
replacement interval occurs, the replace-as-needed strategy is applied to that particular
part.

In the prognostics mode, the residual life of the part is assumed known in every
instance. The parts could be repaired at any time during the service. In this investigation,
the replacement time is chosen to be just before the part fails. No ALDT is incurred. The
repair time is randomly sampled from a Gaussian distribution.
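The three strategies described above can be sketched as a small event-driven Monte Carlo loop for a single platform. This is an illustrative reading of the text, not the model used in this study: the function names, the redrawing of non-positive Gaussian samples, the interpretation of the quoted uncertainties as relative standard deviations, and the assumption that parts age only while the platform operates are all assumptions of the sketch. It is not expected to reproduce the reported values.

```python
import random


def positive_gauss(rng, mean, std):
    """Gaussian draw, redrawn until positive (an assumption of this sketch)."""
    while True:
        x = rng.gauss(mean, std)
        if x > 0.0:
            return x


def simulate_platform(rng, strategy, n_parts=20, mean_life=182.5, rel_std=0.25,
                      mission=730.0, repair=(4 / 24, 0.05), aldt=(12 / 24, 0.80),
                      interval=None):
    """Return (operational availability, spare parts used); times are in days.

    strategy is "as_needed", "simultaneous", or "prognostics".
    repair and aldt are (mean, relative standard deviation) pairs.
    """
    if strategy == "simultaneous" and interval is None:
        raise ValueError("the simultaneous strategy needs a replacement interval")

    def new_life():
        return positive_gauss(rng, mean_life, rel_std * mean_life)

    def repair_time():
        return positive_gauss(rng, repair[0], repair[1] * repair[0])

    def aldt_time():
        return positive_gauss(rng, aldt[0], aldt[1] * aldt[0])

    op = cal = down = 0.0   # operating time, calendar time, accumulated down time
    spares = 0
    fail_at = [new_life() for _ in range(n_parts)]  # failure stamps in operating time
    next_sched = interval if strategy == "simultaneous" else float("inf")

    while True:
        i = min(range(n_parts), key=fail_at.__getitem__)  # next part to fail
        event = min(fail_at[i], next_sched)
        if cal + (event - op) >= mission:
            break                       # mission ends before the next event
        cal += event - op
        op = event
        if event == next_sched:
            # Scheduled replacement of all parts: no ALDT, longest repair time.
            delay = max(repair_time() for _ in range(n_parts))
            fail_at = [op + new_life() for _ in range(n_parts)]
            spares += n_parts
            next_sched = op + interval
        else:
            # Single-part replacement; prognostics removes the ALDT.
            delay = repair_time()
            if strategy != "prognostics":
                delay += aldt_time()
            fail_at[i] = op + new_life()
            spares += 1
        delay = min(delay, mission - cal)   # do not count down time past the mission
        down += delay
        cal += delay
    return (mission - down) / mission, spares


rng = random.Random(1)


def fleet_average(strategy, n_platforms=300, **kwargs):
    """Average availability and spares over a fleet of independent platforms."""
    runs = [simulate_platform(rng, strategy, **kwargs) for _ in range(n_platforms)]
    return (sum(r[0] for r in runs) / n_platforms,
            sum(r[1] for r in runs) / n_platforms)


baseline = fleet_average("as_needed")
prognostic = fleet_average("prognostics")
scheduled = fleet_average("simultaneous", interval=0.5 * 182.5)
```

Each call returns the fleet-average availability and the average number of spares per platform, which are the two quantities compared across strategies in the results.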

D.  SIMULATION  INPUT  PARAMETERS 

Table III-1 gives all the model input parameters used in the simulations.


Table III-1. Input Parameters

Parameter                                                                Value
-----------------------------------------------------------------------  ---------------
Operation time in years                                                  2
Number of platforms                                                      300
Number of parts                                                          5, 10, 15, 20
Mean part lifetime in years                                              ¼, ½, 1, 2
Part lifetime standard deviation as a percentage of mean lifetime        5, 10, 25, 50
Failure distribution (all parts have the same distribution)              Gaussian
Part repair time in hours (uncertainty)                                  4 (5%)
Part delivery time in hours (uncertainty)                                12 (80%)
Simultaneous-replacement time interval as a percentage of part lifetime  50, 75, 90
Percentage of parts with prognostics capability                          25, 50, 75, 100
IV. SIMULATION RESULTS


A.  REPLACE  AS  NEEDED 

The  system  of  300  platforms  was  first  simulated  in  the  replace-as-needed  mode, 
with instantaneous replacement of the failed parts. The upper plot of Figure IV-1 is a
histogram of the times between platform failures. These results can be described by
exponential behavior with reasonable accuracy: an exponential fit to the histogram gives a
mean time between failures (MTBF) for this system of 10.0 ± 0.1 days, with the
uncertainty taken from the fit error. The observed system reliability behavior is in
agreement with previous findings by Drenick [10], who showed that under appropriate
conditions, complex systems fail in an exponential manner even though their individual
parts behave according to other types of failure distributions. This happens when the
system consists of a large number of parts, the parts are connected in series, and the part
behavior is uncorrelated. The significance of this result is that the reliability of the
system under consideration can be described by one parameter, the MTBF, or its
inverse, the failure rate, λ. Describing reliability by a single MTBF is therefore a
reasonable practice for military platforms, which are systems with large numbers of parts.
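The pooling of part failures into platform failures can be illustrated with a short sketch that simulates instantaneous replacement and collects the times between platform failures. It is an assumption-laden illustration rather than the model used here: the function name, seed, and redrawing of non-positive lifetimes are hypothetical, and the MTBF is estimated as the plain sample mean of the gaps instead of an exponential fit to the histogram, so it need not equal the fitted value quoted above.

```python
import random


def platform_failure_times(rng, n_parts, mean_life, rel_std, horizon):
    """Sorted failure times of one platform with instantaneous part replacement."""
    times = []
    for _ in range(n_parts):
        t = 0.0
        while True:
            life = rng.gauss(mean_life, rel_std * mean_life)
            if life <= 0.0:
                continue            # redraw non-positive lifetimes (assumption)
            t += life
            if t > horizon:
                break
            times.append(t)         # this part fails and is replaced at time t
    return sorted(times)


rng = random.Random(2)
gaps = []
for _ in range(300):                # 300 platforms, 20 parts, 2 years of operation
    ts = platform_failure_times(rng, 20, 182.5, 0.25, 730.0)
    gaps.extend(later - earlier for earlier, later in zip(ts, ts[1:]))

mtbf_estimate = sum(gaps) / len(gaps)   # mean time between platform failures, days
```

For long horizons the mean gap tends toward the mean part lifetime divided by the number of parts (182.5 / 20, or about 9.1 days), which is the same order as the fitted MTBF; a histogram of `gaps` can be inspected to judge how closely it follows an exponential shape.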

The lower plot of Figure IV-1 depicts the behavior of the times between failures of
a system with the added Gaussian-distributed ALDT and repair times.

Figure  IV-2  shows  the  effect  of  decreasing  the  number  and  increasing  the  lifetime 
(or  reliability)  of  individual  parts  on  the  total  system  reliability — the  well-known  design 
strategy for improving system reliability. As expected, as the part lifetime increases, the
average  platform  MTBF  also  increases.  The  effect  is  more  pronounced  for  a  small 
number  of  parts.  The  error  on  the  average  MTBF  for  platforms  composed  of  smaller 
numbers  of  parts  with  longer  lifetime  is  larger  than  that  of  platforms  with  larger  numbers 
of  parts  due  to  a  less  accurate  fit  to  an  exponential  distribution.  For  platforms  composed 
of  large  numbers  of  parts,  even  substantial  improvement  in  the  lifetime  of  individual  parts 
results  in  only  marginal  increase  in  system  reliability.  The  significance  of  this  result  is 
that  a  considerable  design  effort  should  be  focused  on  reducing  the  number  of  reliability- 
critical  parts. 

Figure IV-1. Time Between Failures

The upper plot is a histogram of the time between platform failures without logistics
delays for a system of 300 platforms, each composed of 20 critical parts with ½-year
lifetimes. The lower plot is the time between failures with logistics delay that includes both
delivery and repair times.

In  real  operational  environments,  delays  in  bringing  a  system  back  to  operational 
status  are  primarily  determined  by  ALDT,  and  so  the  system  behavior  can  be  described 
by its operational availability. Figure IV-3 depicts the operational availability as a
function of time for 300 platforms, each composed of 20 critical parts. The operational
availability starts at 1 in the beginning of the platform’s service, and then it exhibits a
periodic behavior eventually approaching some asymptotic value. The instantaneous $A_o$
can be averaged over a 2-year mission time to yield 0.48 ± 0.02.

Figure IV-2. System MTBF vs. Part Lifetime

Average MTBF for a system of 300 platforms as a function of the number and lifetime of the
critical parts that make up each platform.


Figure IV-3. Operational Availability vs. Time: with ALDT
Instantaneous operational availability, $A_o(t)$, for a system of 300 platforms each composed
of 20 critical parts with a lifetime of ½ year as a function of time.


Figure  IV-4  shows  the  effect  of  the  number  and  reliability  of  the  individual  parts 
on  the  average  platform  operational  availability.  As  the  reliability  of  the  parts  increases, 
the operational availability also increases. In the limit of very long part lifetimes, the
operational availability approaches 1. These results demonstrate that, as in the
instantaneous-replacement case (no repair and no ALDT), a significant tradeoff between
the number of parts and their reliability is preserved when ALDT and repair
times apply.



Figure  IV-4.  Part  Lifetime  vs.  Operational  Availability 
Average operational availability for a system of 300 platforms as a function of the number
and lifetime of the critical parts that make up each platform.

Figure IV-5 shows the effect of reducing ALDT on operational availability. Even
with  a  relatively  low  part  lifetime,  the  operational  availability  can  be  substantially 
improved  by  reducing  ALDT.  This  means  that  while  it  might  be  very  difficult  to  improve 
reliability  of  a  platform  itself  by  design,  reducing  its  ALDT  might  be  less  of  a  challenge, 
leading  to  a  practical  way  to  achieve  substantial  improvement  in  operational  availability. 

B.  SIMULTANEOUS-REPLACEMENT  STRATEGY 

One  of  the  possible  strategies  to  improve  operational  availability  is  to  replace  the 
system-critical  parts,  all  at  the  same  time,  at  regular  intervals.  Figure  IV-6  depicts  results 
of  the  simulation  of  a  system  with  a  simultaneous-replacement  frequency  of  137  days 
(three-fourths  of  the  part’s  lifetime).  Under  those  conditions,  the  average  operational 
availability  over  a  2-year  time  period  is  0.71  ±  0.05,  a  significant  improvement  over  0.48 
±  0.02,  the  operational  availability  of  a  similar  system  without  simultaneous  replacements 
(Figure  IV-3). 

Figure IV-5. ALDT vs. Operational Availability
Average  operational  availability  as  a  function  of  critical  part  lifetime  for  various  ALDTs  for 
a  system  of  300  platforms,  each  with  20  critical  parts. 


Figure IV-6. Operational Availability vs. Time: 75% Lifetime Simultaneous-Replacement Frequency

Instantaneous  operational  availability  with  a  platform  simultaneous-replacement 
frequency  of  137  days  (75%  of  the  lifetime  of  the  critical  parts  that  make  up  the  platform) 
for  a  system  of  300  platforms  as  a  function  of  time.  Each  platform  is  composed  of  20 
critical  parts  with  a  mean  and  standard  deviation  of  182.5  ±  45.6  days. 

Figure IV-7 shows the effect of increasing the simultaneous-replacement
frequency so that the interval is one-half of the part’s lifetime. In this case,
$A_o$ = 0.90 ± 0.05. Increasing this frequency leads to a considerable improvement in
operational availability.


Figure IV-7. Operational Availability vs. Time: 50% Lifetime Simultaneous-Replacement Frequency

Instantaneous  operational  availability  with  a  platform  simultaneous-replacement 
frequency  of  91.25  days  (50%  of  the  lifetime  of  the  critical  parts  that  make  up  the  platform) 
for  a  system  of  300  platforms  as  a  function  of  mission  time. 

While  the  simultaneous-replacement  strategy  does  lead  to  high  operational 
availability,  it  is  important  to  recognize  that  the  penalty  for  this  approach  is  the  high  cost 
associated with replacing parts too often, thus underutilizing their service lives. A series
of simulations was performed to illustrate the effect of increasing the simultaneous-
replacement frequency on life-cycle costs. In this case, the life-cycle cost is assumed to
be proportional to the number of parts changed during the platform operation.
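Because cost is taken to be proportional to the number of parts changed, a rough count of the scheduled replacements already shows how cost grows as the interval shrinks. The sketch below is a back-of-the-envelope estimate, not the simulation result: it ignores down time and the extra parts consumed by failures between scheduled replacements, and the function name is hypothetical.

```python
import math


def scheduled_spares(n_parts: int, mission: float, interval: float) -> int:
    """Parts used by scheduled replacements that fall strictly inside the mission."""
    events = math.ceil(mission / interval) - 1
    return n_parts * max(events, 0)


mean_life = 182.5   # days (½-year parts)
mission = 730.0     # days (2 years)

# Replacement intervals of 50, 75, and 90 percent of the mean part lifetime.
spares = {pct: scheduled_spares(20, mission, pct / 100 * mean_life)
          for pct in (50, 75, 90)}
```

For a 20-part platform this gives 140, 100, and 80 scheduled part changes over two years for the 50, 75, and 90 percent intervals, respectively.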

Figure  IV-8  depicts  the  operational  availability  of  a  system  of  platforms  as  a 
function  of  part  lifetime;  also  shown  is  the  number  of  spare  parts  required  to  maintain 
that  system.  For  a  platform  with  a  given  number  of  parts  and  their  lifetimes,  a 
maintenance  strategy  could  be  chosen  based  on  desired  operational  availability  and  cost 
constraints.  If  no  simultaneous  replacement  is  performed,  that  is,  if  the  system  is  allowed 
to operate until the part’s failure (with a subsequent part replacement), the incurred cost
is  at  a  minimum,  but  operational  availability  is  also  limited.  In  particular,  only  platforms 
with  very  reliable  parts  (with  mean  lifetimes  greater  than  500  days)  can  achieve  an 
operational  availability  over  75%. 

Figure IV-8. Operational Availability vs. Part Lifetime: Different Simultaneous-Replacement Frequencies

Average  operational  availability  and  the  average  number  of  replaced  parts  are  shown  as 
functions  of  part  lifetime  for  a  system  of  300  platforms,  with  parts  replaced  at 
simultaneous-replacement  intervals  of  50%,  75%,  and  90%  of  the  critical  part  lifetime. 

To  improve  the  operational  availability,  a  higher  simultaneous-replacement 
frequency  may  be  chosen.  As  this  frequency  increases,  even  the  platforms  with  relatively 
low  part  lifetimes  can  reach  a  high  operational  availability.  For  example,  if  an  operational 
availability  greater  than  90%  is  desired  for  a  platform  composed  of  parts  with  lifetimes  of 
180  days,  the  simultaneous  replacement  could  be  performed  every  90  days  (50%  of  the 
part’s  lifetime).  In  this  case,  however,  the  cost  (the  total  number  of  parts  that  are 
replaced)  of  such  a  maintenance  strategy  is  considerably  higher  than  that  for  a  platform 
with  a  part  lifetime  of  360  days. 

To  put  these  results  into  perspective,  consider  the  reliability  of  current  military 
platforms. The Stryker Infantry Carrier Vehicle has an MTBF of approximately 167
hours (~7.0 days) [11]. This is equivalent to a simulated system having 20 critical parts,
each with a ½-year lifetime. The Bradley Fighting Vehicle has an MTBF of 133 hours, or
about 5.5 days [11], which is equivalent to a simulated system with 20 critical parts, each
having ¼- to ½-year lifetimes. The M1A2 Abrams main battle tank has an MTBF of 27
hours [11], which is equivalent to a simulated system with 30 critical parts, each having a
1-month lifetime. The implication is that to maintain a high operational availability for
platforms  with  those  relatively  low  reliabilities  over  long  mission  times  would  require 
frequent  simultaneous  replacements  and  significant  cost.  If  the  mission  times  are  short,  it 
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is  possible  to  reduce  the  costs  by  performing  the  simultaneous  replacements  just  before 
the  mission,  a  so-called  “pulse  reliability”  improvement.  Note,  however,  that  in  this  case 
the  average  operational  availability  over  an  extended  period  of  time  will  still  be  low. 
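
The platform equivalences quoted above can be sanity-checked with a rough rule of thumb. The sketch below is not the method used in the study; it assumes that failed parts are replaced individually, that the part ages have become staggered (steady state), and that the platform fails whenever any one of its N parts fails. Under those assumptions the long-run mean time between platform failures is approximately the part lifetime divided by N. The function name `series_mtbf_days` is hypothetical.

```python
HOURS_PER_DAY = 24.0
DAYS_PER_YEAR = 365.0


def series_mtbf_days(part_lifetime_days, n_parts):
    """Steady-state approximation: n_parts independent part slots, each failing
    once per part_lifetime_days on average, give n_parts / lifetime failures per day."""
    return part_lifetime_days / n_parts


# (platform, reported MTBF in hours, critical parts, part lifetime in days)
cases = [
    ("Stryker, 1/2-year parts", 167.0, 20, DAYS_PER_YEAR / 2),
    ("Bradley, 1/4-year parts", 133.0, 20, DAYS_PER_YEAR / 4),
    ("Bradley, 1/2-year parts", 133.0, 20, DAYS_PER_YEAR / 2),
    ("M1A2 Abrams, 1-month parts", 27.0, 30, DAYS_PER_YEAR / 12),
]

for name, mtbf_hours, n_parts, lifetime_days in cases:
    reported_days = mtbf_hours / HOURS_PER_DAY
    approx_days = series_mtbf_days(lifetime_days, n_parts)
    print(f"{name}: reported {reported_days:.1f} days, lifetime/N gives {approx_days:.1f} days")
```

The approximation gives values of the same order as the reported MTBFs (for example, about 9 days against the Stryker's 7 days), which is the level of agreement to be expected from so coarse an estimate.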

C.  THE  EFFECT  OF  PROGNOSTICS 

The  prognostics  capability  in  this  investigation  is  described  by  the  prognostics 
ratio,  p,  defined  as  a  ratio  of  the  number  of  critical  parts  for  which  the  remaining  life  can 
be  predicted  to  the  total  number  of  critical  parts.  Note  that  in  general  it  is  unlikely  that  all 
failures  in  a  platform  can  be  predicted.  For  example,  most  of  the  electronic  parts  fail 
randomly  and  not  according  to  a  wear-out  law;  hence  no  physics-of-failure  prognostics 
can  be  implemented  for  those  parts. 

The  benefits  of  the  prognostics  capability  are  twofold.  When  a  part  failure  is 
anticipated,  in  the  sense  that  its  remaining  life  is  known,  the  replacement  part  can  be 
ordered  in  advance,  reducing  ALDT  essentially  to  zero.  This  has  a  pronounced  effect  on 
improving  operational  availability.  In  addition,  knowing  the  time  to  failure  allows 
military  commanders  in  the  field  to  choose  platforms  that  are  theoretically  capable  of 
performing  failure-free  during  the  upcoming  mission,  as  well  as  to  schedule  maintenance 
when  spare  parts  and  resources  are  available. 

Figure  IV-9  demonstrates  the  effect  of  prognostics  on  operational  availability  for 
systems  with  various  part  lifetimes.  Note  that  the  part  lifetime  plays  an  important  role, 
even  when  the  prognostics  capability  is  implemented.  For  example,  a  Stryker  vehicle 
with  an  MTBF  of  167  hours  is  equivalent  to  a  simulated  system  of  20  critical  parts  each 
with  a  1/2-year  lifetime.  The  operational  availability  of  such  a  system  is  about  50%.  Now 
if  prognostics  were  to  be  implemented  on  12  of  those  20  parts  (p  =  0.6),  the 
operational  availability  of  such  a  system  would  substantially  increase  to  70%.  The 
significance  of  this  result  is  that  the  prognostics  approach  by  itself  should  not  be 
considered  a  way  to  improve  vehicle  reliability.  Rather,  the  prognostics  approach  is  a 
means  to  improve  operational  availability,  which  is  achieved  by  anticipating  part  failures 
and  improving  efficiencies  in  replacing  parts.  Even  with  prognostics,  however,  the 
operational  availability  is  limited  by  the  inherent  system  reliability,  which  is  a  function  of 
vehicle  design  and  operating  conditions. 
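
The Stryker example can be reproduced approximately with the standard uptime ratio, operational availability = uptime / (uptime + downtime). The sketch below is a consistency check under stated assumptions, not the study's simulation: it assumes that all downtime is ALDT (repair time is set to zero), that a failure is equally likely to occur in any of the critical parts, and that prognostics removes the ALDT entirely for the fraction p of failures that are predicted. The ALDT of 7 days is chosen only so that the baseline reproduces the quoted 50%.

```python
def operational_availability(mtbf_days, aldt_days, repair_days=0.0, prognostic_ratio=0.0):
    """Ao = MTBF / (MTBF + mean downtime), where ALDT is incurred only for the
    (1 - prognostic_ratio) fraction of failures that were not predicted."""
    if not 0.0 <= prognostic_ratio <= 1.0:
        raise ValueError("prognostic_ratio must lie between 0 and 1")
    mean_downtime = repair_days + (1.0 - prognostic_ratio) * aldt_days
    return mtbf_days / (mtbf_days + mean_downtime)


mtbf_days = 7.0  # Stryker MTBF of about 167 hours
aldt_days = 7.0  # assumed value, chosen to give Ao = 50% without prognostics

for p in (0.0, 0.6):
    ao = operational_availability(mtbf_days, aldt_days, prognostic_ratio=p)
    print(f"p = {p:.1f}: Ao = {ao:.2f}")
```

With p = 0.6 this gives an availability of about 0.71, in line with the increase to roughly 70% described above. The MTBF itself is unchanged in the calculation, which is the point of the paragraph: prognostics shortens downtime but does not make the vehicle more reliable.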
Figure  IV-9.  Prognostic  Effect  on  Operational  Availability 
Average  operational  availability  as  a  function  of  the  fraction  of  critical  parts  with 
prognostics  for  a  system  of  300  platforms  each  with  20  critical  parts. 

Figure  IV-10  compares  the  number  of  replacement  parts  needed  to 
maintain  a  system  at  a  desired  operational  availability  when  prognostics  are  applied  with 
the  number  needed  when  simultaneous  replacement  is  applied.  The  prognostics 
part-replacement  strategy  produces  higher  operational  availability  values  as  the  prognostic 
ratio,  p,  increases  from  0  to  1  in  steps  of  0.25.  The  simultaneous-replacement  strategy 
produces  higher  operational  availability  values  as  the  simultaneous-replacement 
frequency  increases,  starting  from  the  no-simultaneous-replacement  case  and  then  moving 
to  replacement  intervals  of  90%,  75%,  and  50%  of  the  critical  part  lifetime.  Note  that  in  both  cases, 
additional  parts  are  needed  to  maintain  low-reliability  systems  at  high  operational 
availability.  However,  the  prognostics  strategy  requires  fewer  replacement  parts.  For 
systems  composed  of  less  reliable  parts,  this  difference  is  substantial.  Therefore,  using 
prognostics  is  a  much  more  effective  and  potentially  less  expensive  approach  to 
achieving  high  operational  availability. 
Figure  IV-10.  Simultaneous  Replacement  or  Prognostics:  Operational  Availability  vs.  Number  of  Parts 
Average  operational  availability  as  a  function  of  the  number  of  replaced  parts  and  their 
lifetimes  for  the  simultaneous  replacement  and  prognostic  part-replacement  strategies  for 
a  system  of  300  platforms  composed  of  20  parts  each.  The  solid  lines  have  5  points 
indicating  the  prognostic  ratios  of  0%,  25%,  50%,  75%,  and  100%  (moving  from  lower  left 
to  upper  right).  The  dashed  lines  represent  the  frequency  of  system  simultaneous 
replacements  based  on  the  percentage  of  the  critical  part’s  total  lifetime.  Moving  from 
lower  left  to  upper  right,  the  four  points  are:  no  simultaneous  replacement  performed,  90% 
of  part  life,  75%  of  part  life,  and  50%  of  part  life. 
V.  CONCLUSIONS 


1.  The  reliability  of  complex  systems  with  many  independently  operating  parts 
connected  in  series  can  be  significantly  improved  by  reducing  the  number  of 
critical  parts.  Improving  the  individual  part  reliability  also  leads  to  an  increase 
in  the  overall  system  reliability,  but  this  effect  is  less  pronounced  for  systems 
with  a  large  number  of  parts. 

2.  Operational  availability  of  the  military  vehicles  is  significantly  affected  by 
ALDT  and  repair  times.  Improving  the  maintenance  practices,  that  is, 
decreasing  the  delay  and  repair  times,  is  an  effective  means  of  improving  the 
operational  availability. 

3.  Performing  simultaneous  replacements  increases  operational  availability  over 
a  specified  operational  time.  It  can  be  done  at  regular  intervals  or  just  before 
an  upcoming  mission  to  increase  the  system  pulse  reliability.  Frequent 
simultaneous-replacement  maintenance  will  lead  to  an  increased  average 
operational  availability  over  extended  time  periods  even  for  low-reliability 
systems,  but  it  will  also  result  in  substantial  costs,  the  result  of  replacing  many 
parts  and  underutilizing  their  service  life. 

4.  The  prognostics  approach  reduces  ALDT  and  improves  operational 
availability  by  anticipating  failures  and  preparing  the  necessary  replacement 
parts  in  advance.  In  addition,  the  prognostics  capability  allows  for  intelligent 
maintenance — replacing  only  those  parts  whose  remaining  lifetime  has 
reached  a  critical  (predetermined)  value.  In  this  case,  the  operational 
availability  increase  is  obtained  at  significantly  lower  cost  (in  terms  of  the 
number  of  spares)  than  that  of  the  simultaneous-replacement  maintenance 
strategy.  The  prognostics  approach  can  in  theory  lead  to  failure-free  missions 
because  it  allows  field  commanders  to  select  only  those  platforms  whose 
predicted  remaining  life  exceeds  the  duration  of  the  upcoming  mission. 
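
The selection and replacement rules in the last conclusion can be sketched as follows. This is a minimal illustration, assuming that a remaining-life prediction is available for every critical part on a platform (p = 1); parts without prognostics cannot be screened this way. The `Platform` class, the function names, and all numerical values are hypothetical and do not come from the study.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Platform:
    name: str
    remaining_life_days: List[float]  # predicted remaining life of each monitored part


def mission_ready(platforms, mission_days, margin_days=0.0):
    """Platforms whose shortest predicted remaining part life exceeds the
    mission duration plus an optional safety margin."""
    return [p for p in platforms
            if min(p.remaining_life_days) > mission_days + margin_days]


def parts_due(platform, threshold_days):
    """Indices of parts whose predicted remaining life has reached the
    predetermined replacement threshold."""
    return [i for i, life in enumerate(platform.remaining_life_days)
            if life <= threshold_days]


# Hypothetical fleet of three platforms with three monitored parts each.
fleet = [
    Platform("A", [40.0, 12.0, 90.0]),
    Platform("B", [35.0, 60.0, 45.0]),
    Platform("C", [5.0, 80.0, 70.0]),
]

print([p.name for p in mission_ready(fleet, mission_days=30.0)])
print(parts_due(fleet[0], threshold_days=15.0))
```

For a 30-day mission only platform B qualifies, and platform A needs a single part (index 1) replaced rather than a full simultaneous replacement, which is the source of the savings in spares.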
APPENDIX  A 

THE  EFFECT  OF  THE  PART  LIFETIME  UNCERTAINTY  ON 
OPERATIONAL  AVAILABILITY 


Figure  A-1.  Effect  of  Part  Lifetime  Uncertainty  on  Operational  Availability 
Effect  of  increasing  the  fractional  uncertainty  of  the  critical  part  lifetime  (i.e.,  ratio  of  the 
standard  deviation  to  the  mean  lifetime)  on  Mean  Time  Before  Failure  and  Operational 
Availability  for  a  system  of  300  platforms,  each  made  up  of  20  critical  parts  with  a  1/2-year  lifetime. 

The  results  in  Figure  A-1  show  that  the  average  operational  availability  is  only 
weakly  dependent  upon  the  uncertainty  in  the  critical  part  lifetime.  Instead,  the  average 
operational  availability  is  dependent  on  the  total  number  of  part  failures  for  a  given 
mission  time,  not  how  far  apart  the  failures  are  in  time.  In  the  tests  performed  in  this 
study,  the  mission  time  is  long  compared  with  the  part  lifetime.  So  even  as  the  lifetime 
uncertainty  gets  broader,  the  total  number  of  failed  parts  over  the  mission  remains  fixed, 
and  operational  availability  stays  constant.  In  addition,  the  mean  time  between  failure 
gets  longer  as  the  fractional  error  on  the  lifetime  is  increased.  This  is  due  to  the 
broadening  of  the  time  before  failure  distribution,  such  as  Figure  IV-2,  which  depends 
upon  the  broadening  of  the  underlying  lifetime  distribution  for  each  part.  But  the 
spreading  of  the  underlying  lifetime  distribution  also  forces  the  system  failures  to  occur 
at  increasingly  earlier  times  than  would  occur  with  parts  with  smaller  uncertainty  in  their 
lifetimes. 
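
The claim that the number of part failures over a long mission is insensitive to the lifetime spread can be illustrated with a small simulation. The sketch below is not the study's model: it assumes normally distributed part lifetimes, immediate replacement on failure with no downtime, and a hypothetical mission of 1825 days (ten mean lifetimes of a 1/2-year part). The function names and the 1-day floor on sampled lifetimes are assumptions made for the illustration.

```python
import random


def failures_in_slot(mission_days, mean_life_days, frac_sigma, rng):
    """Count failures in one part slot; a failed part is replaced immediately."""
    elapsed, failures = 0.0, 0
    while True:
        life = rng.gauss(mean_life_days, frac_sigma * mean_life_days)
        elapsed += max(life, 1.0)  # guard against rare non-positive draws
        if elapsed > mission_days:
            return failures
        failures += 1


def mean_failures(mission_days, mean_life_days, frac_sigma, n_slots=6000, seed=0):
    """Average failures per part slot over n_slots independent slots
    (6000 corresponds to 300 platforms with 20 parts each)."""
    rng = random.Random(seed)
    total = sum(failures_in_slot(mission_days, mean_life_days, frac_sigma, rng)
                for _ in range(n_slots))
    return total / n_slots


mission_days = 1825.0   # hypothetical: ten mean lifetimes
mean_life_days = 182.5  # 1/2-year part lifetime

for frac_sigma in (0.1, 0.2, 0.3):
    avg = mean_failures(mission_days, mean_life_days, frac_sigma)
    print(f"sigma/mean = {frac_sigma:.1f}: {avg:.2f} failures per part slot")
```

Under these assumptions the average failure count per slot changes only slightly as the fractional uncertainty triples, which is consistent with the weak dependence of operational availability on lifetime uncertainty described above.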